Vietnamese Text Classification with TextRank and Jaccard Similarity Coefficient



Similar Resources

Unilateral Jaccard Similarity Coefficient

Similarity measures are essential to solve many pattern recognition problems such as classification, clustering, and retrieval problems. Various similarity measures are categorized in both syntactic and semantic relationships. In this paper we present a novel similarity, Unilateral Jaccard Similarity Coefficient (uJaccard), which doesn’t only take into consideration the space among two points b...


Efficient Identification of Tanimoto Nearest Neighbors All Pairs Similarity Search Using the Extended Jaccard Coefficient

Tanimoto, or extended Jaccard, is an important similarity measure which has seen prominent use in fields such as data mining and chemoinformatics. Many of the existing state-of-the-art methods for market basket analysis, plagiarism and anomaly detection, compound database search, and ligand-based virtual screening rely heavily on identifying Tanimoto nearest neighbors. Given the rapidly increas...
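As a rough illustration of the measure this abstract names, the sketch below computes the Tanimoto (extended Jaccard) coefficient for two real-valued vectors, using the standard definition a·b / (|a|² + |b|² − a·b). It is not the search method of the paper listed here. The function name `tanimoto` and the choice to return 0.0 for two all-zero vectors are assumptions made for this example.

```python
from typing import Sequence


def tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """Tanimoto (extended Jaccard) coefficient of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    # |a|^2 + |b|^2 - a.b ; for 0/1 vectors this is the size of the set union
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Assumption for this sketch: two all-zero vectors score 0.0
    return dot / denom if denom else 0.0


# For binary vectors the value matches the set-based Jaccard coefficient:
# sets {0, 1} and {0, 2} share 1 element out of 3 in the union.
score = tanimoto([1, 1, 0], [1, 0, 1])
```

For binary (0/1) vectors the dot product counts shared elements and the denominator counts the union, so the result reduces to the ordinary Jaccard coefficient.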


Kernels and Similarity Measures for Text Classification

Measuring similarity between two strings is a fundamental step in text classification and other problems of information retrieval. Recently, kernel-based methods have been proposed for this task; since kernels are inner products in a feature space, they naturally induce similarity measures. Information theoretic (dis)similarities have also been the subject of recent research. This paper describ...


Detecting Zero-day Polymorphic Worms with Jaccard Similarity Algorithm

Zero-day polymorphic worms pose a serious threat to the security of mobile systems and Internet infrastructure. In many cases, it is difficult to detect worm attacks at an early stage. There is typically little or no time to develop a well-constructed solution during such a worm outbreak. This is because the worms act only to spread from node to node and they bring security concerns to everyone...


Variations of the Similarity Function of TextRank for Automated Summarization

This article presents new alternatives to the similarity function for the TextRank algorithm for automated summarization of texts. We describe the generalities of the algorithm and the different functions we propose. Some of these variants achieve a significant improvement using the same metrics and dataset as the original publication.
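For context, the baseline that such variants replace is the sentence similarity from the original TextRank publication: the number of words two sentences share, divided by the sum of the logarithms of their lengths. The sketch below is a minimal version of that baseline only, not of the alternatives proposed in this article. The function name `textrank_similarity`, the lowercase whitespace tokenization, and the 0.0 fallback for degenerate inputs are assumptions made for this example.

```python
import math


def textrank_similarity(s1: str, s2: str) -> float:
    """Word overlap normalized by log sentence lengths (TextRank baseline)."""
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing
    w1 = s1.lower().split()
    w2 = s2.lower().split()
    if not w1 or not w2:
        return 0.0
    denom = math.log(len(w1)) + math.log(len(w2))
    # Two one-word sentences give log(1) + log(1) = 0; return 0.0 by assumption
    if denom == 0:
        return 0.0
    overlap = len(set(w1) & set(w2))
    return overlap / denom


sim = textrank_similarity("the cat sat", "the dog sat")
```

In TextRank these pairwise scores become edge weights in a sentence graph, so swapping in a different similarity function changes the graph while leaving the ranking step untouched.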



Journal

Journal title: Advances in Science, Technology and Engineering Systems Journal

Year: 2020

ISSN: 2415-6698

DOI: 10.25046/aj050644